Overview

Brought to you by YData

Dataset statistics

Number of variables: 19
Number of observations: 1048575
Missing cells: 0
Missing cells (%): 0.0%
Duplicate rows: 25
Duplicate rows (%): < 0.1%
Total size in memory: 308.0 MiB
Average record size in memory: 308.0 B
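
The memory figures above are related by simple arithmetic. As a minimal sketch (the variable names are illustrative, and 1 MiB is taken as 1024 × 1024 bytes), the average record size and the duplicate share can be recomputed from the reported totals:

```python
MIB = 1024 * 1024  # bytes per mebibyte

total_bytes = 308.0 * MIB   # "Total size in memory" from the report
n_rows = 1_048_575          # "Number of observations"
duplicates = 25             # "Duplicate rows"

# Average record size is total memory divided by the number of rows.
avg_record_bytes = total_bytes / n_rows
# Share of duplicate rows, which the report rounds to "< 0.1%".
duplicate_share = duplicates / n_rows

print(f"average record size: {avg_record_bytes:.1f} B")
print(f"duplicate rows: {duplicate_share:.4%}")
```
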

Variable types

Categorical: 6
DateTime: 2
Numeric: 10
Boolean: 1

Alerts

Dataset has 25 (< 0.1%) duplicate rows [Duplicates]
VendorID is highly overall correlated with extra [High correlation]
congestion_surcharge is highly overall correlated with improvement_surcharge and 2 other fields [High correlation]
extra is highly overall correlated with VendorID [High correlation]
fare_amount is highly overall correlated with total_amount and 1 other fields [High correlation]
improvement_surcharge is highly overall correlated with congestion_surcharge and 1 other fields [High correlation]
mta_tax is highly overall correlated with congestion_surcharge and 1 other fields [High correlation]
tip_amount is highly overall correlated with total_amount [High correlation]
total_amount is highly overall correlated with congestion_surcharge and 3 other fields [High correlation]
trip_distance is highly overall correlated with fare_amount and 1 other fields [High correlation]
store_and_fwd_flag is highly imbalanced (96.3%) [Imbalance]
payment_type is highly imbalanced (57.8%) [Imbalance]
mta_tax is highly imbalanced (92.0%) [Imbalance]
improvement_surcharge is highly imbalanced (95.5%) [Imbalance]
congestion_surcharge is highly imbalanced (74.6%) [Imbalance]
Airport_fee is highly imbalanced (70.9%) [Imbalance]
trip_distance is highly skewed (γ1 = 772.1069164) [Skewed]
passenger_count has 11322 (1.1%) zeros [Zeros]
trip_distance has 15052 (1.4%) zeros [Zeros]
extra has 415665 (39.6%) zeros [Zeros]
tip_amount has 250931 (23.9%) zeros [Zeros]
tolls_amount has 966772 (92.2%) zeros [Zeros]
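
The report does not state how the imbalance percentages are computed. The figures for store_and_fwd_flag and payment_type are consistent with one minus the normalized Shannon entropy of the class shares, so the sketch below assumes that definition. The function name `imbalance_score` is hypothetical, and the counts are copied from the variable sections of this report.

```python
import math

def imbalance_score(counts):
    """Return 1 - H / log2(k), with H the Shannon entropy (bits) of the class shares.

    Assumed definition: 0 means perfectly balanced classes, 1 means a single
    class holds every observation.
    """
    total = sum(counts)
    k = len(counts)
    if k < 2:
        return 0.0
    entropy = -sum((c / total) * math.log2(c / total) for c in counts if c > 0)
    return 1.0 - entropy / math.log2(k)

# Class counts taken from the report.
store_and_fwd_flag = [1044416, 4159]
payment_type = [842317, 180389, 18252, 7617]

print(f"store_and_fwd_flag: {imbalance_score(store_and_fwd_flag):.1%}")
print(f"payment_type: {imbalance_score(payment_type):.1%}")
```
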

Reproduction

Analysis started: 2024-12-09 21:48:22.304585
Analysis finished: 2024-12-09 21:52:10.965520
Duration: 3 minutes and 48.66 seconds
Software version: ydata-profiling v4.12.1
Download configuration: config.json
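
As a quick consistency check, the reported duration can be recomputed from the start and finish timestamps. This is only an illustrative sketch using the standard library:

```python
from datetime import datetime

started = datetime.fromisoformat("2024-12-09 21:48:22.304585")
finished = datetime.fromisoformat("2024-12-09 21:52:10.965520")

# Elapsed wall-clock time in seconds, split into minutes and seconds.
elapsed = (finished - started).total_seconds()
minutes, seconds = divmod(elapsed, 60)

print(f"{int(minutes)} minutes and {seconds:.2f} seconds")
```
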

Variables

VendorID
Categorical

High correlation 

Distinct: 2
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 50.0 MiB

Length

Max length: 1
Median length: 1
Mean length: 1
Min length: 1

Characters and Unicode

Total characters: 1048575
Distinct characters: 2
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
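
To illustrate what such character properties look like, the sketch below queries Python's standard `unicodedata` module for a few characters that occur in this dataset's categorical values. It is an illustration only and not necessarily how the profiling tool derives its category, script and block columns.

```python
import unicodedata

# Characters that appear in the categorical values of this report.
for ch in ["1", "2", ".", "-"]:
    # category() returns the two-letter general category, name() the official name.
    print(repr(ch), unicodedata.category(ch), unicodedata.name(ch))
```

Digits fall in the general category `Nd` (decimal number), while the full stop and hyphen-minus are punctuation (`Po` and `Pd`).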

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 2
2nd row: 1
3rd row: 1
4th row: 1
5th row: 1

Common Values

Value  Count   Frequency (%)
2      795759  75.9%
1      252816  24.1%
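
The Frequency (%) column throughout this report is the count divided by the number of observations, shown with one decimal and with "< 0.1%" or "> 99.9%" for shares that would otherwise round to 0 or 100. The helper below is a hypothetical sketch of a rounding rule that reproduces the values seen in this report; the name `format_share` is an assumption, not the tool's API.

```python
N_ROWS = 1_048_575  # number of observations in the report

def format_share(count, total=N_ROWS):
    """Format count/total as a percentage with edge cases for tiny and near-total shares."""
    share = count / total
    if share > 0 and round(share, 3) == 0:
        return "< 0.1%"
    if share < 1 and round(share, 3) == 1:
        return "> 99.9%"
    return f"{share * 100:.1f}%"

print(format_share(795759))   # VendorID value 2
print(format_share(252816))   # VendorID value 1
print(format_share(470))      # a rare value
```
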

Length

Histogram of lengths of the category

Common Values (Plot)

Words

Value  Count   Frequency (%)
2      795759  75.9%
1      252816  24.1%

Most occurring characters

Value  Count   Frequency (%)
2      795759  75.9%
1      252816  24.1%

Most occurring categories

Value      Count    Frequency (%)
(unknown)  1048575  100.0%

Most occurring scripts

Value      Count    Frequency (%)
(unknown)  1048575  100.0%

Most occurring blocks

Value      Count    Frequency (%)
(unknown)  1048575  100.0%

Most frequent character per category, per script, and per block

Each breakdown contains a single group, (unknown), whose character counts match the Most occurring characters table.

DateTime

Distinct: 17488
Distinct (%): 1.7%
Missing: 0
Missing (%): 0.0%
Memory size: 8.0 MiB
Minimum: 2002-12-31 22:59:00
Maximum: 2024-12-01 23:59:00
Invalid dates: 0
Invalid dates (%): 0.0%

Histogram with fixed size bins (bins=50)

DateTime

Distinct: 17543
Distinct (%): 1.7%
Missing: 0
Missing (%): 0.0%
Memory size: 8.0 MiB
Minimum: 2002-12-31 23:05:00
Maximum: 2024-12-01 23:59:00
Invalid dates: 0
Invalid dates (%): 0.0%

Histogram with fixed size bins (bins=50)

passenger_count
Real number (ℝ)

Zeros 

Distinct: 9
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 1.3653024
Minimum: 0
Maximum: 8
Zeros: 11322
Zeros (%): 1.1%
Negative: 0
Negative (%): 0.0%
Memory size: 8.0 MiB

Quantile statistics

Minimum: 0
5-th percentile: 1
Q1: 1
Median: 1
Q3: 1
95-th percentile: 3
Maximum: 8
Range: 8
Interquartile range (IQR): 0

Descriptive statistics

Standard deviation: 0.87298052
Coefficient of variation (CV): 0.6394045
Kurtosis: 9.3016177
Mean: 1.3653024
Median Absolute Deviation (MAD): 0
Skewness: 2.8581338
Sum: 1431622
Variance: 0.762095
Monotonicity: Not monotonic

Histogram with fixed size bins (bins=9)

Common values

Value  Count   Frequency (%)
1      798851  76.2%
2      155991  14.9%
3      37704   3.6%
4      24078   2.3%
5      12466   1.2%
0      11322   1.1%
6      8132    0.8%
8      26      < 0.1%
7      5       < 0.1%

Minimum 10 values

Value  Count   Frequency (%)
0      11322   1.1%
1      798851  76.2%
2      155991  14.9%
3      37704   3.6%
4      24078   2.3%
5      12466   1.2%
6      8132    0.8%
7      5       < 0.1%
8      26      < 0.1%

Maximum 10 values

Value  Count   Frequency (%)
8      26      < 0.1%
7      5       < 0.1%
6      8132    0.8%
5      12466   1.2%
4      24078   2.3%
3      37704   3.6%
2      155991  14.9%
1      798851  76.2%
0      11322   1.1%
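
Because passenger_count takes only nine distinct values, its descriptive statistics can be recomputed from the value counts alone. The sketch below does this with the standard library. It assumes the sample variance (divisor n − 1), which matches the reported figure; the report itself does not state the convention.

```python
import math

# Value counts for passenger_count, copied from the table above.
counts = {1: 798851, 2: 155991, 3: 37704, 4: 24078, 5: 12466,
          0: 11322, 6: 8132, 8: 26, 7: 5}

n = sum(counts.values())                          # number of observations
total = sum(v * c for v, c in counts.items())     # "Sum" in the report
mean = total / n
# Sample variance with divisor n - 1 (assumed convention).
variance = sum(c * (v - mean) ** 2 for v, c in counts.items()) / (n - 1)
std = math.sqrt(variance)
cv = std / mean                                   # coefficient of variation

print(f"n={n} sum={total} mean={mean:.7f}")
print(f"variance={variance:.6f} std={std:.8f} cv={cv:.7f}")
```
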

trip_distance
Real number (ℝ)

High correlation  Skewed  Zeros 

Distinct: 3774
Distinct (%): 0.4%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 3.4430206
Minimum: 0
Maximum: 10879.28
Zeros: 15052
Zeros (%): 1.4%
Negative: 0
Negative (%): 0.0%
Memory size: 8.0 MiB

Quantile statistics

Minimum: 0
5-th percentile: 0.46
Q1: 1
Median: 1.7
Q3: 3.28
95-th percentile: 16.1
Maximum: 10879.28
Range: 10879.28
Interquartile range (IQR): 2.28

Descriptive statistics

Standard deviation: 11.675558
Coefficient of variation (CV): 3.3910799
Kurtosis: 718118.28
Mean: 3.4430206
Median Absolute Deviation (MAD): 0.88
Skewness: 772.10692
Sum: 3610265.3
Variance: 136.31865
Monotonicity: Not monotonic

Histogram with fixed size bins (bins=50)

Common values

Value                Count   Frequency (%)
0                    15052   1.4%
0.9                  14001   1.3%
0.8                  13787   1.3%
1                    13727   1.3%
1.1                  13367   1.3%
0.7                  13183   1.3%
1.2                  12817   1.2%
1.3                  12273   1.2%
1.4                  11575   1.1%
0.6                  11543   1.1%
Other values (3764)  917250  87.5%

Minimum 10 values

Value  Count  Frequency (%)
0      15052  1.4%
0.01   892    0.1%
0.02   630    0.1%
0.03   470    < 0.1%
0.04   373    < 0.1%
0.05   314    < 0.1%
0.06   254    < 0.1%
0.07   218    < 0.1%
0.08   211    < 0.1%
0.09   189    < 0.1%

Maximum 10 values

Value     Count  Frequency (%)
10879.28  1      < 0.1%
971.8     1      < 0.1%
964.6     1      < 0.1%
233.25    1      < 0.1%
210.82    1      < 0.1%
176.43    1      < 0.1%
142.62    1      < 0.1%
115.75    1      < 0.1%
111.57    1      < 0.1%
101.28    1      < 0.1%
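
The extreme skewness of trip_distance comes from a handful of very large values. One conventional way to flag such values, not something the report itself applies, is Tukey's 1.5 × IQR rule. The sketch below derives the fences from the reported quartiles and checks the five largest entries of the maximum-values table against the upper fence.

```python
# Q1 and Q3 of trip_distance, taken from the quantile statistics above.
q1, q3 = 1.0, 3.28
iqr = q3 - q1

# Tukey fences: values beyond 1.5 * IQR from the quartiles are flagged.
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr

# The five largest trip distances listed in the report.
largest = [10879.28, 971.8, 964.6, 233.25, 210.82]
flagged = [d for d in largest if d > upper_fence]

print(f"IQR={iqr:.2f} upper fence={upper_fence:.2f}")
print(f"flagged {len(flagged)} of {len(largest)} listed values")
```

Since the fence sits below the reported 95-th percentile of 16.1, this rule would flag more than 5% of trips, so a domain-specific threshold may be more appropriate for this column.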

RatecodeID
Real number (ℝ)

Distinct: 7
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 2.1558734
Minimum: 1
Maximum: 99
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 8.0 MiB

Quantile statistics

Minimum: 1
5-th percentile: 1
Q1: 1
Median: 1
Q3: 1
95-th percentile: 2
Maximum: 99
Range: 98
Interquartile range (IQR): 0

Descriptive statistics

Standard deviation: 10.196364
Coefficient of variation (CV): 4.7295743
Kurtosis: 86.064939
Mean: 2.1558734
Median Absolute Deviation (MAD): 0
Skewness: 9.3760771
Sum: 2260595
Variance: 103.96583
Monotonicity: Not monotonic

Histogram with fixed size bins (bins=7)

Common values

Value  Count   Frequency (%)
1      981137  93.6%
2      42416   4.0%
99     11476   1.1%
5      7689    0.7%
3      3373    0.3%
4      2483    0.2%
6      1       < 0.1%

Minimum 10 values

Value  Count   Frequency (%)
1      981137  93.6%
2      42416   4.0%
3      3373    0.3%
4      2483    0.2%
5      7689    0.7%
6      1       < 0.1%
99     11476   1.1%

Maximum 10 values

Value  Count   Frequency (%)
99     11476   1.1%
6      1       < 0.1%
5      7689    0.7%
4      2483    0.2%
3      3373    0.3%
2      42416   4.0%
1      981137  93.6%

store_and_fwd_flag
Boolean

Imbalance 

Distinct: 2
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 1.0 MiB

Common Values

Value  Count    Frequency (%)
False  1044416  99.6%
True   4159     0.4%

PULocationID
Real number (ℝ)

Distinct: 247
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 165.18883
Minimum: 1
Maximum: 265
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 8.0 MiB

Quantile statistics

Minimum: 1
5-th percentile: 48
Q1: 132
Median: 161
Q3: 233
95-th percentile: 249
Maximum: 265
Range: 264
Interquartile range (IQR): 101

Descriptive statistics

Standard deviation: 63.082007
Coefficient of variation (CV): 0.38187817
Kurtosis: -0.81467577
Mean: 165.18883
Median Absolute Deviation (MAD): 54
Skewness: -0.24910261
Sum: 1.7321287 × 10^8
Variance: 3979.3396
Monotonicity: Not monotonic

Histogram with fixed size bins (bins=50)

Common values

Value               Count   Frequency (%)
132                 62088   5.9%
161                 50493   4.8%
237                 48796   4.7%
236                 47161   4.5%
142                 39224   3.7%
186                 38951   3.7%
230                 38121   3.6%
162                 37813   3.6%
138                 34626   3.3%
239                 32117   3.1%
Other values (237)  619185  59.1%

Minimum 10 values

Value  Count  Frequency (%)
1      144    < 0.1%
2      2      < 0.1%
3      33     < 0.1%
4      837    0.1%
6      9      < 0.1%
7      515    < 0.1%
8      5      < 0.1%
9      19     < 0.1%
10     373    < 0.1%
11     18     < 0.1%

Maximum 10 values

Value  Count  Frequency (%)
265    609    0.1%
264    3718   0.4%
263    20364  1.9%
262    14029  1.3%
261    4958   0.5%
260    263    < 0.1%
259    36     < 0.1%
258    48     < 0.1%
257    33     < 0.1%
256    184    < 0.1%

DOLocationID
Real number (ℝ)

Distinct: 260
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 164.62893
Minimum: 1
Maximum: 265
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 8.0 MiB

Quantile statistics

Minimum: 1
5-th percentile: 43
Q1: 114
Median: 162
Q3: 234
95-th percentile: 261
Maximum: 265
Range: 264
Interquartile range (IQR): 120

Descriptive statistics

Standard deviation: 69.495842
Coefficient of variation (CV): 0.42213626
Kurtosis: -0.90044768
Mean: 164.62893
Median Absolute Deviation (MAD): 68
Skewness: -0.37621195
Sum: 1.7262578 × 10^8
Variance: 4829.6721
Monotonicity: Not monotonic

Histogram with fixed size bins (bins=50)

Common values

Value               Count   Frequency (%)
236                 49431   4.7%
237                 43945   4.2%
161                 40372   3.9%
239                 33436   3.2%
230                 33201   3.2%
142                 33192   3.2%
170                 30470   2.9%
162                 29932   2.9%
141                 29417   2.8%
238                 27882   2.7%
Other values (250)  697297  66.5%

Minimum 10 values

Value  Count  Frequency (%)
1      2966   0.3%
2      1      < 0.1%
3      95     < 0.1%
4      3751   0.4%
5      6      < 0.1%
6      25     < 0.1%
7      2931   0.3%
8      17     < 0.1%
9      114    < 0.1%
10     1106   0.1%

Maximum 10 values

Value  Count  Frequency (%)
265    4667   0.4%
264    5793   0.6%
263    22464  2.1%
262    16885  1.6%
261    4659   0.4%
260    864    0.1%
259    145    < 0.1%
258    278    < 0.1%
257    433    < 0.1%
256    1950   0.2%

payment_type
Categorical

Imbalance 

Distinct: 4
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 50.0 MiB

Length

Max length: 1
Median length: 1
Mean length: 1
Min length: 1

Characters and Unicode

Total characters: 1048575
Distinct characters: 4
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 2
2nd row: 1
3rd row: 1
4th row: 1
5th row: 1

Common Values

Value  Count   Frequency (%)
1      842317  80.3%
2      180389  17.2%
4      18252   1.7%
3      7617    0.7%

Length

Histogram of lengths of the category

Common Values (Plot)

Words

Value  Count   Frequency (%)
1      842317  80.3%
2      180389  17.2%
4      18252   1.7%
3      7617    0.7%

Most occurring characters

Value  Count   Frequency (%)
1      842317  80.3%
2      180389  17.2%
4      18252   1.7%
3      7617    0.7%

Most occurring categories

Value      Count    Frequency (%)
(unknown)  1048575  100.0%

Most occurring scripts

Value      Count    Frequency (%)
(unknown)  1048575  100.0%

Most occurring blocks

Value      Count    Frequency (%)
(unknown)  1048575  100.0%

Most frequent character per category, per script, and per block

Each breakdown contains a single group, (unknown), whose character counts match the Most occurring characters table.

fare_amount
Real number (ℝ)

High correlation 

Distinct: 1936
Distinct (%): 0.2%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 18.590696
Minimum: -700
Maximum: 1616.5
Zeros: 307
Zeros (%): < 0.1%
Negative: 13716
Negative (%): 1.3%
Memory size: 8.0 MiB

Quantile statistics

Minimum: -700
5-th percentile: 5.1
Q1: 8.6
Median: 12.8
Q3: 20.5
95-th percentile: 70
Maximum: 1616.5
Range: 2316.5
Interquartile range (IQR): 11.9

Descriptive statistics

Standard deviation: 19.313701
Coefficient of variation (CV): 1.0388907
Kurtosis: 88.010266
Mean: 18.590696
Median Absolute Deviation (MAD): 4.9
Skewness: 3.5970617
Sum: 19493739
Variance: 373.01904
Monotonicity: Not monotonic

Histogram with fixed size bins (bins=50)

Common values

Value                Count   Frequency (%)
8.6                  51074   4.9%
7.9                  50880   4.9%
9.3                  50373   4.8%
10                   49326   4.7%
7.2                  48839   4.7%
10.7                 46291   4.4%
11.4                 43798   4.2%
6.5                  43452   4.1%
70                   41686   4.0%
12.1                 40744   3.9%
Other values (1926)  582112  55.5%

Minimum 10 values

Value   Count  Frequency (%)
-700    1      < 0.1%
-600    1      < 0.1%
-509.8  1      < 0.1%
-439.1  1      < 0.1%
-423    1      < 0.1%
-404.1  1      < 0.1%
-367.7  1      < 0.1%
-367    1      < 0.1%
-351.6  1      < 0.1%
-300    1      < 0.1%

Maximum 10 values

Value   Count  Frequency (%)
1616.5  1      < 0.1%
912.3   1      < 0.1%
820     1      < 0.1%
700     1      < 0.1%
678.5   1      < 0.1%
620.4   1      < 0.1%
600     1      < 0.1%
536.4   1      < 0.1%
530.8   1      < 0.1%
520.08  1      < 0.1%

extra
Real number (ℝ)

High correlation  Zeros 

Distinct: 34
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 1.578604
Minimum: -7.5
Maximum: 11.75
Zeros: 415665
Zeros (%): 39.6%
Negative: 7102
Negative (%): 0.7%
Memory size: 8.0 MiB

Quantile statistics

Minimum: -7.5
5-th percentile: 0
Q1: 0
Median: 1
Q3: 2.5
95-th percentile: 5
Maximum: 11.75
Range: 19.25
Interquartile range (IQR): 2.5

Descriptive statistics

Standard deviation: 1.8518877
Coefficient of variation (CV): 1.1731173
Kurtosis: 2.4729546
Mean: 1.578604
Median Absolute Deviation (MAD): 1
Skewness: 1.2906061
Sum: 1655284.7
Variance: 3.4294881
Monotonicity: Not monotonic

Histogram with fixed size bins (bins=34)

Common values

Value              Count   Frequency (%)
0                  415665  39.6%
2.5                271188  25.9%
1                  189948  18.1%
5                  78415   7.5%
3.5                51361   4.9%
7.5                9107    0.9%
6                  8664    0.8%
9.25               4254    0.4%
-1                 4027    0.4%
4.25               4011    0.4%
Other values (24)  11935   1.1%

Minimum 10 values

Value  Count   Frequency (%)
-7.5   87      < 0.1%
-6     129     < 0.1%
-5     487     < 0.1%
-2.5   2371    0.2%
-1.5   1       < 0.1%
-1     4027    0.4%
0      415665  39.6%
0.01   2       < 0.1%
0.02   1       < 0.1%
0.06   2       < 0.1%

Maximum 10 values

Value  Count  Frequency (%)
11.75  1042   0.1%
10.25  1037   0.1%
10     249    < 0.1%
9.95   1      < 0.1%
9.25   4254   0.4%
8.5    149    < 0.1%
7.75   888    0.1%
7.5    9107   0.9%
6.75   1590   0.2%
6      8664   0.8%

mta_tax
Categorical

High correlation  Imbalance 

Distinct: 5
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 52.0 MiB

Length

Max length: 4
Median length: 3
Mean length: 3.0127297
Min length: 3

Characters and Unicode

Total characters: 3159073
Distinct characters: 7
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 2
Unique (%): < 0.1%

Sample

1st row: 0.5
2nd row: 0.5
3rd row: 0.5
4th row: 0.5
5th row: 0.5

Common Values

Value  Count    Frequency (%)
0.5    1023566  97.6%
-0.5   13348    1.3%
0.0    11659    1.1%
1.6    1        < 0.1%
0.8    1        < 0.1%

Length

Histogram of lengths of the category

Common Values (Plot)

Words

Value  Count    Frequency (%)
0.5    1036914  98.9%
0.0    11659    1.1%
1.6    1        < 0.1%
0.8    1        < 0.1%

Most occurring characters

Value  Count    Frequency (%)
0      1060233  33.6%
.      1048575  33.2%
5      1036914  32.8%
-      13348    0.4%
1      1        < 0.1%
6      1        < 0.1%
8      1        < 0.1%

Most occurring categories

Value      Count    Frequency (%)
(unknown)  3159073  100.0%

Most occurring scripts

Value      Count    Frequency (%)
(unknown)  3159073  100.0%

Most occurring blocks

Value      Count    Frequency (%)
(unknown)  3159073  100.0%

Most frequent character per category, per script, and per block

Each breakdown contains a single group, (unknown), whose character counts match the Most occurring characters table.
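
Because mta_tax was profiled as a categorical (text) variable, its character statistics follow directly from the string form of each value. As a sketch, the character counts, total characters and mean length above can be rebuilt from the Common Values counts with the standard library:

```python
from collections import Counter

# Value counts for mta_tax as strings, copied from the Common Values table.
value_counts = {"0.5": 1023566, "-0.5": 13348, "0.0": 11659, "1.6": 1, "0.8": 1}

# Each character of a value occurs once per row holding that value.
char_counts = Counter()
for text, count in value_counts.items():
    for ch in text:
        char_counts[ch] += count

total_chars = sum(char_counts.values())
mean_length = total_chars / sum(value_counts.values())

print(char_counts.most_common())
print(f"total characters: {total_chars}, mean length: {mean_length:.7f}")
```

The count for "5" equals the combined rows of 0.5 and -0.5, which is also the figure shown for 0.5 in the Words table, where the leading minus sign is not part of the token.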

tip_amount
Real number (ℝ)

High correlation  Zeros 

Distinct: 3374
Distinct (%): 0.3%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 3.4163597
Minimum: -80
Maximum: 422.7
Zeros: 250931
Zeros (%): 23.9%
Negative: 48
Negative (%): < 0.1%
Memory size: 8.0 MiB

Quantile statistics

Minimum: -80
5-th percentile: 0
Q1: 1
Median: 2.72
Q3: 4.2
95-th percentile: 11.77
Maximum: 422.7
Range: 502.7
Interquartile range (IQR): 3.2

Descriptive statistics

Standard deviation: 4.0528675
Coefficient of variation (CV): 1.1863117
Kurtosis: 218.30725
Mean: 3.4163597
Median Absolute Deviation (MAD): 1.72
Skewness: 5.5523353
Sum: 3582309.3
Variance: 16.425735
Monotonicity: Not monotonic

Histogram with fixed size bins (bins=50)

Common values

Value                Count   Frequency (%)
0                    250931  23.9%
2                    51527   4.9%
1                    39707   3.8%
3                    26627   2.5%
5                    15154   1.4%
2.8                  13342   1.3%
4                    11804   1.1%
3.5                  11769   1.1%
2.1                  11225   1.1%
1.5                  11189   1.1%
Other values (3364)  605300  57.7%

Minimum 10 values

Value   Count  Frequency (%)
-80     1      < 0.1%
-65.1   1      < 0.1%
-22.24  1      < 0.1%
-22     1      < 0.1%
-17.59  1      < 0.1%
-16.19  2      < 0.1%
-8.18   1      < 0.1%
-6.65   1      < 0.1%
-3.36   1      < 0.1%
-3      2      < 0.1%

Maximum 10 values

Value   Count  Frequency (%)
422.7   1      < 0.1%
303     1      < 0.1%
300     1      < 0.1%
280     1      < 0.1%
144     1      < 0.1%
140     1      < 0.1%
130     1      < 0.1%
110     1      < 0.1%
104     1      < 0.1%
103.65  1      < 0.1%

tolls_amount
Real number (ℝ)

Zeros 

Distinct: 700
Distinct (%): 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 0.57740556
Minimum: -60
Maximum: 101.69
Zeros: 966772
Zeros (%): 92.2%
Negative: 866
Negative (%): 0.1%
Memory size: 8.0 MiB

Quantile statistics

Minimum: -60
5-th percentile: 0
Q1: 0
Median: 0
Q3: 0
95-th percentile: 6.94
Maximum: 101.69
Range: 161.69
Interquartile range (IQR): 0

Descriptive statistics

Standard deviation: 2.2212313
Coefficient of variation (CV): 3.8469172
Kurtosis: 56.889985
Mean: 0.57740556
Median Absolute Deviation (MAD): 0
Skewness: 5.004907
Sum: 605453.03
Variance: 4.9338687
Monotonicity: Not monotonic

Histogram with fixed size bins (bins=50)

Common values

Value               Count   Frequency (%)
0                   966772  92.2%
6.94                74237   7.1%
12.75               784     0.1%
-6.94               707     0.1%
3.18                554     0.1%
14.75               510     < 0.1%
13.88               426     < 0.1%
13.38               412     < 0.1%
15.38               236     < 0.1%
5.2                 138     < 0.1%
Other values (690)  3799    0.4%

Minimum 10 values

Value   Count  Frequency (%)
-60     1      < 0.1%
-55.34  1      < 0.1%
-54.02  1      < 0.1%
-52.57  1      < 0.1%
-45     1      < 0.1%
-42.75  1      < 0.1%
-40     1      < 0.1%
-39.38  1      < 0.1%
-38.02  1      < 0.1%
-32.75  1      < 0.1%

Maximum 10 values

Value   Count  Frequency (%)
101.69  1      < 0.1%
87      1      < 0.1%
83      1      < 0.1%
81      1      < 0.1%
80      4      < 0.1%
62.75   1      < 0.1%
60      1      < 0.1%
58.63   1      < 0.1%
57.32   1      < 0.1%
55.55   1      < 0.1%

improvement_surcharge
Categorical

High correlation  Imbalance 

Distinct: 5
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 52.0 MiB

Length

Max length: 4
Median length: 3
Mean length: 3.0131197
Min length: 3

Characters and Unicode

Total characters: 3159482
Distinct characters: 5
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 1
Unique (%): < 0.1%

Sample

1st row: 1.0
2nd row: 1.0
3rd row: 1.0
4th row: 1.0
5th row: 1.0

Common Values

Value  Count    Frequency (%)
1.0    1034457  98.7%
-1.0   13756    1.3%
0.0    263      < 0.1%
0.3    98       < 0.1%
-0.3   1        < 0.1%

Length

Histogram of lengths of the category

Common Values (Plot)

Words

Value  Count    Frequency (%)
1.0    1048213  > 99.9%
0.0    263      < 0.1%
0.3    99       < 0.1%

Most occurring characters

Value  Count    Frequency (%)
0      1048838  33.2%
.      1048575  33.2%
1      1048213  33.2%
-      13757    0.4%
3      99       < 0.1%

Most occurring categories

Value      Count    Frequency (%)
(unknown)  3159482  100.0%

Most occurring scripts

Value      Count    Frequency (%)
(unknown)  3159482  100.0%

Most occurring blocks

Value      Count    Frequency (%)
(unknown)  3159482  100.0%

Most frequent character per category, per script, and per block

Each breakdown contains a single group, (unknown), whose character counts match the Most occurring characters table.

total_amount
Real number (ℝ)

High correlation 

Distinct: 13721
Distinct (%): 1.3%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 27.445239
Minimum: -695.75
Maximum: 1617.5
Zeros: 154
Zeros (%): < 0.1%
Negative: 13757
Negative (%): 1.3%
Memory size: 8.0 MiB

Quantile statistics

Minimum: -695.75
5-th percentile: 10.8
Q1: 15.3
Median: 20.02
Q3: 29
95-th percentile: 82.79
Maximum: 1617.5
Range: 2313.25
Interquartile range (IQR): 13.7

Descriptive statistics

Standard deviation: 24.051793
Coefficient of variation (CV): 0.87635574
Kurtosis: 43.181553
Mean: 27.445239
Median Absolute Deviation (MAD): 5.74
Skewness: 2.9266258
Sum: 28778391
Variance: 578.48873
Monotonicity: Not monotonic

Histogram with fixed size bins (bins=50)

Common values

Value                 Count   Frequency (%)
16.8                  16042   1.5%
12.6                  15513   1.5%
21                    12865   1.2%
15.96                 9150    0.9%
15.12                 9083    0.9%
14.28                 8956    0.9%
17.64                 8423    0.8%
18.48                 8266    0.8%
14                    8166    0.8%
13.44                 8134    0.8%
Other values (13711)  943977  90.0%

Minimum 10 values

Value    Count  Frequency (%)
-695.75  1      < 0.1%
-591     1      < 0.1%
-464.67  1      < 0.1%
-426.54  1      < 0.1%
-416.34  1      < 0.1%
-415.75  1      < 0.1%
-396.2   1      < 0.1%
-374.04  1      < 0.1%
-369.75  1      < 0.1%
-315.47  1      < 0.1%

Maximum 10 values

Value   Count  Frequency (%)
1617.5  1      < 0.1%
940.93  1      < 0.1%
821     1      < 0.1%
715.75  1      < 0.1%
696     1      < 0.1%
630.09  1      < 0.1%
601     1      < 0.1%
586.6   1      < 0.1%
560.18  1      < 0.1%
551.59  1      < 0.1%

congestion_surcharge
Categorical

High correlation  Imbalance 

Distinct: 4
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 52.0 MiB

Length

Max length: 4
Median length: 3
Mean length: 3.0106325
Min length: 3

Characters and Unicode

Total characters: 3156874
Distinct characters: 6
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 1
Unique (%): < 0.1%

Sample

1st row: 2.5
2nd row: 2.5
3rd row: 2.5
4th row: 2.5
5th row: 2.5

Common Values

Value  Count   Frequency (%)
2.5    946669  90.3%
0.0    90757   8.7%
-2.5   11148   1.1%
0.75   1       < 0.1%

Length

Histogram of lengths of the category

Common Values (Plot)

Words

Value  Count   Frequency (%)
2.5    957817  91.3%
0.0    90757   8.7%
0.75   1       < 0.1%

Most occurring characters

Value  Count    Frequency (%)
.      1048575  33.2%
5      957818   30.3%
2      957817   30.3%
0      181515   5.7%
-      11148    0.4%
7      1        < 0.1%

Most occurring categories

Value      Count    Frequency (%)
(unknown)  3156874  100.0%

Most occurring scripts

Value      Count    Frequency (%)
(unknown)  3156874  100.0%

Most occurring blocks

Value      Count    Frequency (%)
(unknown)  3156874  100.0%

Most frequent character per category, per script, and per block

Each breakdown contains a single group, (unknown), whose character counts match the Most occurring characters table.

Airport_fee
Categorical

Imbalance 

Distinct: 3
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 52.1 MiB

Length

Max length: 5
Median length: 3
Mean length: 3.0954114
Min length: 3

Characters and Unicode

Total characters: 3245771
Distinct characters: 6
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 0.0
2nd row: 0.0
3rd row: 0.0
4th row: 0.0
5th row: 0.0

Common Values

Value  Count   Frequency (%)
0.0    950470  90.6%
1.75   96164   9.2%
-1.75  1941    0.2%

Length

Histogram of lengths of the category

Common Values (Plot)

Words

Value  Count   Frequency (%)
0.0    950470  90.6%
1.75   98105   9.4%

Most occurring characters

Value  Count    Frequency (%)
0      1900940  58.6%
.      1048575  32.3%
1      98105    3.0%
7      98105    3.0%
5      98105    3.0%
-      1941     0.1%

Most occurring categories

Value      Count    Frequency (%)
(unknown)  3245771  100.0%

Most occurring scripts

Value      Count    Frequency (%)
(unknown)  3245771  100.0%

Most occurring blocks

Value      Count    Frequency (%)
(unknown)  3245771  100.0%

Most frequent character per category, per script, and per block

Each breakdown contains a single group, (unknown), whose character counts match the Most occurring characters table.

Interactions

2024-12-09T14:50:56.638620image/svg+xmlMatplotlib v3.8.4, https://matplotlib.org/
2024-12-09T14:51:04.272561image/svg+xmlMatplotlib v3.8.4, https://matplotlib.org/
2024-12-09T14:51:12.011965image/svg+xmlMatplotlib v3.8.4, https://matplotlib.org/
2024-12-09T14:51:19.435037image/svg+xmlMatplotlib v3.8.4, https://matplotlib.org/
2024-12-09T14:51:27.683807image/svg+xmlMatplotlib v3.8.4, https://matplotlib.org/
2024-12-09T14:51:35.553308image/svg+xmlMatplotlib v3.8.4, https://matplotlib.org/
2024-12-09T14:51:43.117393image/svg+xmlMatplotlib v3.8.4, https://matplotlib.org/
2024-12-09T14:51:50.837089image/svg+xmlMatplotlib v3.8.4, https://matplotlib.org/
2024-12-09T14:51:58.999220image/svg+xmlMatplotlib v3.8.4, https://matplotlib.org/
2024-12-09T14:50:50.165379image/svg+xmlMatplotlib v3.8.4, https://matplotlib.org/
2024-12-09T14:50:57.437916image/svg+xmlMatplotlib v3.8.4, https://matplotlib.org/
2024-12-09T14:51:04.983700image/svg+xmlMatplotlib v3.8.4, https://matplotlib.org/
2024-12-09T14:51:12.829387image/svg+xmlMatplotlib v3.8.4, https://matplotlib.org/
2024-12-09T14:51:20.526102image/svg+xmlMatplotlib v3.8.4, https://matplotlib.org/
2024-12-09T14:51:28.470159image/svg+xmlMatplotlib v3.8.4, https://matplotlib.org/
2024-12-09T14:51:36.365259image/svg+xmlMatplotlib v3.8.4, https://matplotlib.org/
2024-12-09T14:51:43.778504image/svg+xmlMatplotlib v3.8.4, https://matplotlib.org/
2024-12-09T14:51:51.753863image/svg+xmlMatplotlib v3.8.4, https://matplotlib.org/

Correlations

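| | Airport_fee | DOLocationID | PULocationID | RatecodeID | VendorID | congestion_surcharge | extra | fare_amount | improvement_surcharge | mta_tax | passenger_count | payment_type | store_and_fwd_flag | tip_amount | tolls_amount | total_amount | trip_distance |
| Airport_fee | 1.000 | 0.074 | 0.396 | 0.034 | 0.056 | 0.321 | 0.480 | 0.252 | 0.265 | 0.257 | 0.041 | 0.149 | 0.006 | 0.073 | 0.395 | 0.270 | 0.003 |
| DOLocationID | 0.074 | 1.000 | 0.085 | -0.055 | 0.011 | 0.125 | -0.001 | -0.103 | 0.014 | 0.087 | -0.012 | 0.036 | 0.004 | -0.006 | -0.050 | -0.091 | -0.102 |
| PULocationID | 0.396 | 0.085 | 1.000 | -0.141 | 0.037 | 0.181 | -0.037 | -0.165 | 0.013 | 0.023 | -0.022 | 0.028 | 0.004 | -0.049 | -0.148 | -0.157 | -0.167 |
| RatecodeID | 0.034 | -0.055 | -0.141 | 1.000 | 0.187 | 0.342 | -0.121 | 0.374 | 0.016 | 0.018 | 0.073 | 0.052 | 0.006 | 0.093 | 0.492 | 0.356 | 0.283 |
| VendorID | 0.056 | 0.011 | 0.037 | 0.187 | 1.000 | 0.073 | 0.597 | 0.060 | 0.066 | 0.064 | 0.200 | 0.061 | 0.100 | 0.004 | 0.024 | 0.065 | 0.000 |
| congestion_surcharge | 0.321 | 0.125 | 0.181 | 0.342 | 0.073 | 1.000 | 0.273 | 0.497 | 0.521 | 0.542 | 0.018 | 0.304 | 0.007 | 0.066 | 0.088 | 0.523 | 0.000 |
| extra | 0.480 | -0.001 | -0.037 | -0.121 | 0.597 | 0.273 | 1.000 | 0.088 | 0.242 | 0.245 | -0.036 | 0.161 | 0.065 | 0.147 | 0.151 | 0.185 | 0.103 |
| fare_amount | 0.252 | -0.103 | -0.165 | 0.374 | 0.060 | 0.497 | 0.088 | 1.000 | 0.459 | 0.454 | 0.066 | 0.304 | 0.006 | 0.441 | 0.433 | 0.965 | 0.891 |
| improvement_surcharge | 0.265 | 0.014 | 0.013 | 0.016 | 0.066 | 0.521 | 0.242 | 0.459 | 1.000 | 0.501 | 0.022 | 0.328 | 0.029 | 0.007 | 0.047 | 0.500 | 0.000 |
| mta_tax | 0.257 | 0.087 | 0.023 | 0.018 | 0.064 | 0.542 | 0.245 | 0.454 | 0.501 | 1.000 | 0.036 | 0.323 | 0.007 | 0.107 | 0.163 | 0.498 | 0.000 |
| passenger_count | 0.041 | -0.012 | -0.022 | 0.073 | 0.200 | 0.018 | -0.036 | 0.066 | 0.022 | 0.036 | 1.000 | 0.039 | 0.046 | 0.014 | 0.070 | 0.063 | 0.054 |
| payment_type | 0.149 | 0.036 | 0.028 | 0.052 | 0.061 | 0.304 | 0.161 | 0.304 | 0.328 | 0.323 | 0.039 | 1.000 | 0.005 | 0.020 | 0.034 | 0.327 | 0.000 |
| store_and_fwd_flag | 0.006 | 0.004 | 0.004 | 0.006 | 0.100 | 0.007 | 0.065 | 0.006 | 0.029 | 0.007 | 0.046 | 0.005 | 1.000 | 0.003 | 0.007 | 0.007 | 0.000 |
| tip_amount | 0.073 | -0.006 | -0.049 | 0.093 | 0.004 | 0.066 | 0.147 | 0.441 | 0.007 | 0.107 | 0.014 | 0.020 | 0.003 | 1.000 | 0.252 | 0.580 | 0.411 |
| tolls_amount | 0.395 | -0.050 | -0.148 | 0.492 | 0.024 | 0.088 | 0.151 | 0.433 | 0.047 | 0.163 | 0.070 | 0.034 | 0.007 | 0.252 | 1.000 | 0.445 | 0.412 |
| total_amount | 0.270 | -0.091 | -0.157 | 0.356 | 0.065 | 0.523 | 0.185 | 0.965 | 0.500 | 0.498 | 0.063 | 0.327 | 0.007 | 0.580 | 0.445 | 1.000 | 0.866 |
| trip_distance | 0.003 | -0.102 | -0.167 | 0.283 | 0.000 | 0.000 | 0.103 | 0.891 | 0.000 | 0.000 | 0.054 | 0.000 | 0.000 | 0.411 | 0.412 | 0.866 | 1.000 |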
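
As a rough illustration of how the strongest relationships can be read off this matrix, the sketch below filters a handful of coefficients copied from the table for absolute values above 0.5. The 0.5 cutoff and the `strong_pairs` helper are assumptions made for this example; the report itself does not state which threshold it uses for its high-correlation alerts.

```python
# A subset of upper-triangle entries copied from the correlation table above.
CORR = {
    ("VendorID", "extra"): 0.597,
    ("fare_amount", "total_amount"): 0.965,
    ("fare_amount", "trip_distance"): 0.891,
    ("tip_amount", "total_amount"): 0.580,
    ("congestion_surcharge", "mta_tax"): 0.542,
    ("mta_tax", "total_amount"): 0.498,
    ("passenger_count", "trip_distance"): 0.054,
    ("PULocationID", "trip_distance"): -0.167,
}


def strong_pairs(corr, threshold=0.5):
    """Return (a, b, value) for pairs with |value| > threshold, strongest first.

    The default threshold is an assumption for this sketch.
    """
    hits = [(a, b, v) for (a, b), v in corr.items() if abs(v) > threshold]
    return sorted(hits, key=lambda item: abs(item[2]), reverse=True)


for a, b, v in strong_pairs(CORR):
    print(f"{a:22s} {b:15s} {v:+.3f}")
```

With this subset, the pair fare_amount / total_amount ranks first, while mta_tax / total_amount (0.498) falls just under the assumed cutoff.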

Missing values

A simple visualization of nullity by column.
The nullity matrix is a data-dense display that lets you quickly pick out patterns in data completion.
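
The two views described above can be approximated in plain Python. This is a minimal sketch, not the code the report uses: the helper names and the three example records are hypothetical, and the gaps are invented for illustration (the profiled dataset itself reports no missing cells).

```python
def non_null_counts(rows, columns):
    """Count present values per column; only None is treated as missing."""
    return {c: sum(row.get(c) is not None for row in rows) for c in columns}


def nullity_matrix(rows, columns):
    """One text line per record: '#' for a present value, '.' for a missing one."""
    return [
        "".join("#" if row.get(c) is not None else "." for c in columns)
        for row in rows
    ]


# Hypothetical records with gaps, for illustration only.
columns = ["fare_amount", "tip_amount", "tolls_amount"]
rows = [
    {"fare_amount": 17.7, "tip_amount": 0.0, "tolls_amount": 0.0},
    {"fare_amount": 10.0, "tip_amount": None, "tolls_amount": 0.0},
    {"fare_amount": None, "tip_amount": None, "tolls_amount": 0.0},
]

print(non_null_counts(rows, columns))            # per-column counts (bar view)
print("\n".join(nullity_matrix(rows, columns)))  # per-record pattern (matrix view)
```

Note that a value of 0.0 counts as present here; zeros and missing values are tracked separately in the report.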

Sample

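| | VendorID | tpep_pickup_datetime | tpep_dropoff_datetime | passenger_count | trip_distance | RatecodeID | store_and_fwd_flag | PULocationID | DOLocationID | payment_type | fare_amount | extra | mta_tax | tip_amount | tolls_amount | improvement_surcharge | total_amount | congestion_surcharge | Airport_fee |
| 0 | 2 | 01-01-2024 00:57 | 01-01-2024 01:17 | 1 | 1.72 | 1 | N | 186 | 79 | 2 | 17.7 | 1.0 | 0.5 | 0.00 | 0.0 | 1.0 | 22.70 | 2.5 | 0.00 |
| 1 | 1 | 01-01-2024 00:03 | 01-01-2024 00:09 | 1 | 1.80 | 1 | N | 140 | 236 | 1 | 10.0 | 3.5 | 0.5 | 3.75 | 0.0 | 1.0 | 18.75 | 2.5 | 0.00 |
| 2 | 1 | 01-01-2024 00:17 | 01-01-2024 00:35 | 1 | 4.70 | 1 | N | 236 | 79 | 1 | 23.3 | 3.5 | 0.5 | 3.00 | 0.0 | 1.0 | 31.30 | 2.5 | 0.00 |
| 3 | 1 | 01-01-2024 00:36 | 01-01-2024 00:44 | 1 | 1.40 | 1 | N | 79 | 211 | 1 | 10.0 | 3.5 | 0.5 | 2.00 | 0.0 | 1.0 | 17.00 | 2.5 | 0.00 |
| 4 | 1 | 01-01-2024 00:46 | 01-01-2024 00:52 | 1 | 0.80 | 1 | N | 211 | 148 | 1 | 7.9 | 3.5 | 0.5 | 3.20 | 0.0 | 1.0 | 16.10 | 2.5 | 0.00 |
| 5 | 1 | 01-01-2024 00:54 | 01-01-2024 01:26 | 1 | 4.70 | 1 | N | 148 | 141 | 1 | 29.6 | 3.5 | 0.5 | 6.90 | 0.0 | 1.0 | 41.50 | 2.5 | 0.00 |
| 6 | 2 | 01-01-2024 00:49 | 01-01-2024 01:15 | 2 | 10.82 | 1 | N | 138 | 181 | 1 | 45.7 | 6.0 | 0.5 | 10.00 | 0.0 | 1.0 | 64.95 | 0.0 | 1.75 |
| 7 | 1 | 01-01-2024 00:30 | 01-01-2024 00:58 | 0 | 3.00 | 1 | N | 246 | 231 | 2 | 25.4 | 3.5 | 0.5 | 0.00 | 0.0 | 1.0 | 30.40 | 2.5 | 0.00 |
| 8 | 2 | 01-01-2024 00:26 | 01-01-2024 00:54 | 1 | 5.44 | 1 | N | 161 | 261 | 2 | 31.0 | 1.0 | 0.5 | 0.00 | 0.0 | 1.0 | 36.00 | 2.5 | 0.00 |
| 9 | 2 | 01-01-2024 00:28 | 01-01-2024 00:29 | 1 | 0.04 | 1 | N | 113 | 113 | 2 | 3.0 | 1.0 | 0.5 | 0.00 | 0.0 | 1.0 | 8.00 | 2.5 | 0.00 |

| | VendorID | tpep_pickup_datetime | tpep_dropoff_datetime | passenger_count | trip_distance | RatecodeID | store_and_fwd_flag | PULocationID | DOLocationID | payment_type | fare_amount | extra | mta_tax | tip_amount | tolls_amount | improvement_surcharge | total_amount | congestion_surcharge | Airport_fee |
| 1048565 | 2 | 13-01-2024 03:08 | 13-01-2024 03:20 | 6 | 2.89 | 1 | N | 211 | 233 | 1 | 14.9 | 1.0 | 0.5 | 1.00 | 0.0 | 1.0 | 20.90 | 2.5 | 0.0 |
| 1048566 | 2 | 13-01-2024 03:33 | 13-01-2024 03:49 | 6 | 3.44 | 1 | N | 148 | 246 | 1 | 19.1 | 1.0 | 0.5 | 4.82 | 0.0 | 1.0 | 28.92 | 2.5 | 0.0 |
| 1048567 | 2 | 13-01-2024 03:20 | 13-01-2024 03:23 | 1 | 0.97 | 1 | N | 90 | 125 | 1 | 6.5 | 1.0 | 0.5 | 2.30 | 0.0 | 1.0 | 13.80 | 2.5 | 0.0 |
| 1048568 | 2 | 13-01-2024 03:44 | 13-01-2024 03:48 | 1 | 0.77 | 1 | N | 125 | 249 | 1 | 6.5 | 1.0 | 0.5 | 1.75 | 0.0 | 1.0 | 13.25 | 2.5 | 0.0 |
| 1048569 | 2 | 13-01-2024 03:05 | 13-01-2024 03:17 | 1 | 2.52 | 1 | N | 79 | 68 | 1 | 14.2 | 1.0 | 0.5 | 3.00 | 0.0 | 1.0 | 22.20 | 2.5 | 0.0 |
| 1048570 | 2 | 13-01-2024 03:23 | 13-01-2024 03:28 | 1 | 1.04 | 1 | N | 246 | 48 | 1 | 7.2 | 1.0 | 0.5 | 2.44 | 0.0 | 1.0 | 14.64 | 2.5 | 0.0 |
| 1048571 | 2 | 13-01-2024 03:41 | 13-01-2024 03:43 | 1 | 0.82 | 1 | N | 246 | 50 | 1 | 5.8 | 1.0 | 0.5 | 2.70 | 0.0 | 1.0 | 13.50 | 2.5 | 0.0 |
| 1048572 | 2 | 13-01-2024 03:49 | 13-01-2024 03:52 | 1 | 0.89 | 1 | N | 246 | 48 | 1 | 6.5 | 1.0 | 0.5 | 2.30 | 0.0 | 1.0 | 13.80 | 2.5 | 0.0 |
| 1048573 | 2 | 13-01-2024 03:24 | 13-01-2024 03:36 | 2 | 3.63 | 1 | N | 114 | 141 | 1 | 17.0 | 1.0 | 0.5 | 2.00 | 0.0 | 1.0 | 24.00 | 2.5 | 0.0 |
| 1048574 | 2 | 13-01-2024 03:52 | 13-01-2024 04:18 | 1 | 8.27 | 1 | N | 164 | 188 | 1 | 36.6 | 1.0 | 0.5 | 10.40 | 0.0 | 1.0 | 52.00 | 2.5 | 0.0 |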
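
The timestamps in these samples are printed day-first (the last rows read 13-01-2024, which cannot be a month-first date). As a small sketch of working with that format, the snippet below parses two pickup/dropoff pairs copied from the sample and computes the trip duration. The `trip_minutes` helper and the `FMT` constant are names introduced for this example only.

```python
from datetime import datetime

# Day-first format as displayed in the sample rows (minute resolution).
FMT = "%d-%m-%Y %H:%M"


def trip_minutes(pickup, dropoff):
    """Return the trip duration in minutes between two displayed timestamps."""
    delta = datetime.strptime(dropoff, FMT) - datetime.strptime(pickup, FMT)
    return delta.total_seconds() / 60


first = trip_minutes("01-01-2024 00:57", "01-01-2024 01:17")  # sample row 0
last = trip_minutes("13-01-2024 03:52", "13-01-2024 04:18")   # sample row 1048574
print(first, last)
```

Because the report displays timestamps without seconds, durations computed this way are only accurate to the minute.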

Duplicate rows

Most frequently occurring

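| | VendorID | tpep_pickup_datetime | tpep_dropoff_datetime | passenger_count | trip_distance | RatecodeID | store_and_fwd_flag | PULocationID | DOLocationID | payment_type | fare_amount | extra | mta_tax | tip_amount | tolls_amount | improvement_surcharge | total_amount | congestion_surcharge | Airport_fee | # duplicates |
| 5 | 1 | 04-01-2024 11:14 | 04-01-2024 11:14 | 1 | 0.0 | 1 | N | 193 | 193 | 2 | 3.0 | 0.00 | 0.5 | 0.0 | 0.0 | 1.0 | 4.50 | 0.0 | 0.00 | 3 |
| 0 | 1 | 02-01-2024 08:25 | 02-01-2024 08:25 | 1 | 0.0 | 1 | N | 145 | 145 | 2 | 3.0 | 0.00 | 0.5 | 0.0 | 0.0 | 1.0 | 4.50 | 0.0 | 0.00 | 2 |
| 1 | 1 | 02-01-2024 11:25 | 02-01-2024 11:25 | 1 | 0.0 | 1 | N | 236 | 264 | 2 | 3.0 | 2.50 | 0.5 | 0.0 | 0.0 | 1.0 | 7.00 | 2.5 | 0.00 | 2 |
| 2 | 1 | 02-01-2024 14:43 | 02-01-2024 14:43 | 1 | 0.0 | 1 | N | 114 | 114 | 3 | 3.0 | 2.50 | 0.5 | 0.0 | 0.0 | 1.0 | 7.00 | 2.5 | 0.00 | 2 |
| 3 | 1 | 03-01-2024 08:46 | 03-01-2024 08:46 | 1 | 0.0 | 1 | N | 137 | 264 | 2 | 3.0 | 2.50 | 0.5 | 0.0 | 0.0 | 1.0 | 7.00 | 2.5 | 0.00 | 2 |
| 4 | 1 | 03-01-2024 14:11 | 03-01-2024 14:11 | 2 | 0.0 | 1 | N | 193 | 193 | 2 | 3.0 | 0.00 | 0.5 | 0.0 | 0.0 | 1.0 | 4.50 | 0.0 | 0.00 | 2 |
| 6 | 1 | 04-01-2024 13:34 | 04-01-2024 13:34 | 2 | 0.0 | 1 | N | 193 | 193 | 2 | 3.0 | 0.00 | 0.5 | 0.0 | 0.0 | 1.0 | 4.50 | 0.0 | 0.00 | 2 |
| 7 | 1 | 04-01-2024 15:23 | 04-01-2024 15:23 | 1 | 0.0 | 1 | N | 132 | 132 | 3 | 3.0 | 1.75 | 0.5 | 0.0 | 0.0 | 1.0 | 6.25 | 0.0 | 1.75 | 2 |
| 8 | 1 | 06-01-2024 11:59 | 06-01-2024 11:59 | 0 | 0.0 | 1 | N | 145 | 145 | 2 | 3.0 | 0.00 | 0.5 | 0.0 | 0.0 | 1.0 | 4.50 | 0.0 | 0.00 | 2 |
| 9 | 1 | 07-01-2024 15:40 | 07-01-2024 15:40 | 1 | 0.0 | 1 | N | 39 | 39 | 3 | 3.0 | 0.00 | 0.5 | 0.0 | 0.0 | 1.0 | 4.50 | 0.0 | 0.00 | 2 |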
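
A table like this can be approximated by counting identical records and keeping those that occur more than once. The sketch below does that with abbreviated rows (VendorID, pickup time, PULocationID, DOLocationID, total_amount) taken from the tables in this report. The `duplicate_groups` helper is hypothetical, and whether the report's "# duplicates" column counts occurrences in exactly this way is an assumption.

```python
from collections import Counter


def duplicate_groups(rows):
    """Return (row, occurrences) for rows seen more than once, most frequent first."""
    counts = Counter(rows)
    dups = [(row, n) for row, n in counts.items() if n > 1]
    return sorted(dups, key=lambda item: item[1], reverse=True)


# Abbreviated records: (VendorID, pickup, PULocationID, DOLocationID, total_amount).
r5 = (1, "04-01-2024 11:14", 193, 193, 4.50)
r0 = (1, "02-01-2024 08:25", 145, 145, 4.50)
unique = (2, "01-01-2024 00:57", 186, 79, 22.70)

rows = [r5, r0, unique, r5, r0, r5]

for row, n in duplicate_groups(rows):
    print(n, row)
```

Rows must be hashable for `Counter`, which is why tuples are used here; a full-width comparison would include all 19 columns.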